Convex Optimization 2015.11.18

Let \(\{ f_a : a \in \mathcal{A} \}\) be a collection of convex functions from \(\mathbb{R}^n\) to \(\mathbb{R}\), all with the same domain. Then \(f(x) = \sup_{a \in \mathcal{A}} f_a(x)\) is a convex function.

  • Proof 1: Take \(x, y \in dom \, f\), \(\theta \in [0, 1]\),

    \[\begin{align*} f(\theta x + (1 - \theta)y) = & \sup_a f_a(\theta x + (1 - \theta)y) \\ \le & \sup_a [\theta f_a(x) + (1 - \theta) f_a(y)] \\ \le & \theta \sup_a f_a(x) + (1 - \theta) \sup_a f_a(y) \\ = & \theta f(x) + (1 - \theta) f(y) \end{align*}\]

    (The proposition holds for \(dom \, f = \bigcap _{a \in \mathcal{A}} dom \, f_a\), but can fail for \(dom \, f = \bigcup _{a \in \mathcal{A}} dom \, f_a\), which need not even be convex.)

  • Proof 2: \[\begin{align*} epi(f) = & \{ (x, t) : x \in dom \, f, \, t \ge f_a(x) \; \forall a \in \mathcal{A} \} \\ = & \{ (x, t) : (x, t) \in epi(f_a) \; \forall a \in \mathcal{A} \} \\ = & \bigcap _{a \in \mathcal{A}} epi(f_a) \end{align*}\] which is an intersection of convex sets, hence convex; since \(f\) is convex iff \(epi(f)\) is convex, \(f\) is convex.
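As a quick numerical sanity check (not part of the lecture; a minimal NumPy sketch over a hypothetical finite family of convex functions), the convexity inequality for the pointwise supremum can be verified on random points:

```python
import numpy as np

# Hypothetical finite family of convex functions on R^3 (illustration only).
fs = [lambda x: np.sum(x**2),        # quadratic
      lambda x: np.abs(x).sum(),     # l1 norm
      lambda x: float(np.max(x))]    # max of components

def f(x):
    """Pointwise supremum (here: maximum) over the family."""
    return max(fa(x) for fa in fs)

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform()
    # Convexity: f(t x + (1-t) y) <= t f(x) + (1-t) f(y)
    assert f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) + 1e-12
```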

  • Example: Let \(x_{[i]}\) denote the \(i\)-th largest component of \(x = (x_1, \cdots, x_n) \in \mathbb{R}^n\),
    then \(\text{max-sum}_r(x) = x_{[1]} + x_{[2]} + \cdots + x_{[r]}\) is convex.

  • Proof: \(\text{max-sum}_r(x) = \max \{ x_{i_1} + x_{i_2} + \cdots + x_{i_r} : \{ i_1, \cdots, i_r \} \subseteq \{ 1, \cdots, n \}, \; i_j \neq i_k \text{ for } j \neq k \}\)

    so \(\text{max-sum}_r\) is the pointwise maximum of \(\binom{n}{r}\) linear functions, hence convex. (The sketch below checks this identity numerically.)
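A small sketch of that identity, assuming nothing beyond NumPy and a random test vector:

```python
import numpy as np
from itertools import combinations

def max_sum_r(x, r):
    """Sum of the r largest components of x."""
    return np.sort(x)[-r:].sum()

def max_over_subsets(x, r):
    """The same value as a maximum of linear functions, one per index subset."""
    return max(x[list(S)].sum() for S in combinations(range(len(x)), r))

rng = np.random.default_rng(1)
x = rng.normal(size=6)
assert np.isclose(max_sum_r(x, 3), max_over_subsets(x, 3))
```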

  • Example: Let \(C \subseteq \mathbb{R}^n\), define \(S_C(x) = \sup \{ y^T x : y \in C \}\) (the support function of \(C\)); then \(S_C\) is convex, being a pointwise supremum of linear functions of \(x\).
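For a concrete check, one can take \(C\) to be the Euclidean unit ball, where the support function has the closed form \(S_C(x) = \lVert x \rVert_2\) (this choice of \(C\) is an assumption made only for the sketch):

```python
import numpy as np

# Support function of the unit ball: S_C(x) = sup_{||y|| <= 1} y^T x = ||x||_2.
S_C = np.linalg.norm

rng = np.random.default_rng(2)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    t = rng.uniform()
    assert S_C(t*x + (1-t)*y) <= t*S_C(x) + (1-t)*S_C(y) + 1e-12
```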

  • Example: Let \(f : S^n \to \mathbb{R}\), where \(S^n\) is the set of symmetric \(n \times n\) matrices and \(f(X)\) is the largest eigenvalue of \(X\).

    Claim: \(f\) is convex.

  • Proof: We first claim that \(f(X) = \sup \{ y^T X y : \lVert y \rVert _2 = 1 \}\); granting the claim, \(f\) is a pointwise supremum of the linear functions \(X \mapsto y^T X y\), hence convex.

  • Proof of claim: Write \(X = P D P^T\) with \(P\) orthogonal and \(D = diag(\lambda_1, \cdots, \lambda_n)\), and substitute \(v = P^T y\) (which preserves \(\lVert \cdot \rVert_2\)): \[\begin{align*} \sup _{\lVert y \rVert _2 = 1} y^T X y = & \sup _{\lVert y \rVert _2 = 1} y^T P D P^T y \\ = & \sup _{\lVert v \rVert _2 = 1} v^T D v \\ = & \sup _{\lVert v \rVert _2 = 1} \sum _{i = 1}^n \lambda _i v_i^2 \\ \le & \max _i (\lambda _i) \sup _{\lVert v \rVert _2 = 1} \sum_{i = 1}^n v_i^2 \\ = & \max _i (\lambda _i) \end{align*}\] and the bound is attained by taking \(v = e_i\) at the index of the largest eigenvalue, so the supremum equals \(\max _i \lambda_i = f(X)\).
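A numerical check of the conclusion (a sketch using random symmetric matrices, not from the notes):

```python
import numpy as np

def lam_max(X):
    """Largest eigenvalue of a symmetric matrix (eigvalsh sorts ascending)."""
    return np.linalg.eigvalsh(X)[-1]

rng = np.random.default_rng(3)
def rand_sym(n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

for _ in range(200):
    X, Y = rand_sym(4), rand_sym(4)
    t = rng.uniform()
    # lambda_max = sup_{||y||=1} y^T X y is a sup of linear functions, so convex
    assert lam_max(t*X + (1-t)*Y) <= t*lam_max(X) + (1-t)*lam_max(Y) + 1e-9
```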

  • Example: Let \(f : \mathbb{R}^{m \times n} \to \mathbb{R}\) be defined by \(f(X) = \lVert X \rVert _2\) where \(\lVert X \rVert _2 = \sup _{\lVert y \rVert _2 = 1} \lVert X y \rVert _2\) is the spectral norm of \(X \in \mathbb{R}^{m \times n}\).

  • Claim: \(f(X) = \sup _{u, v} \{ u^T X v : \lVert u \rVert _2 = 1, \lVert v \rVert _2 = 1 \}\), so \(f\) is convex as a pointwise supremum of linear functions of \(X\),

    because \(\lVert Xv \rVert _2 = \sup \{ u^T X v : u \in \mathbb{R}^m, \, \lVert u \rVert _2 = 1 \}\)

    more generally: \(\lVert X \rVert _{a, b} = \sup \{ \lVert Xv \rVert _b : \lVert v \rVert _a = 1 \} = \sup \{ u^T X v : \lVert v \rVert _a = 1, \lVert u \rVert _{b^*} = 1 \}\)

    (\(\lVert Xv \rVert _b = \lVert Xv \rVert _{b^{**}} = \sup _u \{ u^T X v : \lVert u \rVert _{b^*} = 1 \}\), where \(\lVert \cdot \rVert _{b^*}\) denotes the dual norm of \(\lVert \cdot \rVert _b\))
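The claim can be checked via the SVD, since the supremum of \(u^T X v\) over unit vectors is attained at the top singular vectors (a sketch with a hypothetical random matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))

sigma = np.linalg.norm(X, 2)   # spectral norm = largest singular value

# sup { u^T X v : ||u||_2 = ||v||_2 = 1 } is attained at the top singular pair.
U, s, Vt = np.linalg.svd(X)
u, v = U[:, 0], Vt[0]
assert np.isclose(u @ X @ v, sigma)
```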

  • Composition: Suppose \(f(x) = h(g(x))\) with \(g : \mathbb{R}^n \to \mathbb{R}\) and \(h : \mathbb{R} \to \mathbb{R}\); when is \(f\) convex?

  • Would-be proof: Take \(x, y \in dom \, f\), \(\theta \in [0, 1]\); then

    \[\begin{align*} f(\theta x + (1 - \theta)y) = & h(g(\theta x + (1 - \theta)y)) & \text{use $dom \, f$ is convex} \\ \le & h(\theta g(x) + (1 - \theta)g(y)) & \text{use $g$ is convex/concave, $h$ is nondecreasing/nonincreasing} \\ \le & \theta h(g(x)) + (1 - \theta)h(g(y)) & \text{use $h$ is convex} \\ = & \theta f(x) + (1 - \theta)f(y) \end{align*}\]

| Case | 1 | 2 | 3 | 4 |
|------|---|---|---|---|
| Condition on \(g\) | convex | concave | convex | concave |
| Condition on \(h\) | nondecreasing | nonincreasing | nonincreasing | nondecreasing |
| Condition on \(h\) | convex | convex | concave | concave |
| Result: \(f\) | convex | convex | concave | concave |
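A numerical spot check of case 2 from the table, using \(g(x) = \sqrt{x}\) (concave, positive) and \(h(z) = 1/z\) (nonincreasing, convex on \(z \gt 0\)); these particular functions are chosen only for illustration:

```python
import numpy as np

# Case 2: g concave and positive, h nonincreasing and convex => f = h∘g convex.
# Here f(x) = 1/sqrt(x) on x > 0, which is indeed convex (f'' = (3/4) x^(-5/2)).
f = lambda x: 1.0 / np.sqrt(x)

rng = np.random.default_rng(5)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=2)
    t = rng.uniform()
    assert f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) + 1e-12
```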
  • Example: \(g(x) = x^2 - 1\), \(h(x) = x^{3 / 2}\), \(dom \, h = \mathbb{R}_+\),

    then \(dom \, (h \circ g) = (-\infty, -1] \cup [1, +\infty)\) is not convex, so \(h \circ g\) is not convex even though \(g\) is convex and \(h\) is convex and nondecreasing on its domain; the composition rules above also require \(dom \, f\) to be convex.

  • Example 3.13:

    If \(g\) is convex then \(e^{g(x)}\) is convex. (case 1)

    If \(g\) is concave and positive then \(\log(g(x))\) is concave. (case 4)

    If \(g\) is concave and positive then \(\frac{1}{g(x)}\) is convex. (case 2)

    If \(g\) is convex and nonnegative and \(p \ge 1\), then \(g(x)^p\) is convex. (case 1)

    More generally, for \(g : \mathbb{R}^n \to \mathbb{R}^m\), say \(g\) is K-convex, where \(K\) is a cone in \(\mathbb{R}^m\),

    if \(dom \, g\) is convex and \(g(\theta x + (1 - \theta)y) \preceq _K \theta g(x) + (1 - \theta) g(y)\) for all \(x, y \in dom \, g\), \(\theta \in [0, 1]\);

    and say \(h\) is K-nondecreasing if \(x \succeq _K y \implies h(x) \ge h(y)\).

  • Example 3.14:

    \(h(z) = \log \left(\sum _{i = 1}^k e^{z_i} \right)\) is convex and nondecreasing in each argument, so \(\log \left( \sum _{i = 1}^k e^{g_i(x)} \right)\) is convex whenever \(g_1, \cdots, g_k\) are convex (a numerical check follows).
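A sketch of that check, with some hypothetical convex \(g_i\) chosen only for illustration (uses `scipy.special.logsumexp`):

```python
import numpy as np
from scipy.special import logsumexp

# h = log-sum-exp is convex and nondecreasing in each argument, so
# f(x) = h(g_1(x), ..., g_k(x)) is convex when every g_i is convex.
gs = [lambda x: x @ x,              # ||x||^2
      lambda x: x.sum(),            # affine
      lambda x: np.abs(x).sum()]    # l1 norm

def f(x):
    return logsumexp([g(x) for g in gs])

rng = np.random.default_rng(6)
for _ in range(500):
    x, y = rng.normal(size=2), rng.normal(size=2)
    t = rng.uniform()
    assert f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) + 1e-10
```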

  • Minimization: Let \(f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}\) be a convex function; then

    \(g(x) = \inf _{y : (x, y) \in dom \, f} f(x, y)\) is convex.

  • Proof: Let \(x_1, x_2 \in dom \, g\), \(\theta \in [0, 1]\).

    Then for any \(\varepsilon \gt 0\), there exist \(y_1, y_2\) s.t.

    \(g(x_1) \ge f(x_1, y_1) - \varepsilon\), \(g(x_2) \ge f(x_2, y_2) - \varepsilon\), and

    \[\begin{align*} g(\theta x_1 + (1 - \theta)x_2) = & \inf _y f(\theta x_1 + (1 - \theta)x_2, y) \\ \le & f(\theta x_1 + (1 - \theta)x_2, \theta y_1 + (1 - \theta)y_2) \\ \le & \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2) \\ \le & \theta g(x_1) + (1 - \theta) g(x_2) + \varepsilon \end{align*}\]

    Since \(\varepsilon \gt 0\) is arbitrary, \(g(\theta x_1 + (1 - \theta)x_2) \le \theta g(x_1) + (1 - \theta) g(x_2)\), so \(g\) is convex.
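As a concrete instance (chosen for this sketch, not from the notes), partial minimization of the jointly convex quadratic \(f(x, y) = x^2 + xy + y^2\) gives \(g(x) = \frac{3}{4}x^2\), which a grid approximation of the infimum reproduces:

```python
import numpy as np

# f(x, y) = x^2 + x*y + y^2 is jointly convex (Hessian [[2,1],[1,2]] > 0);
# minimizing over y at fixed x gives y* = -x/2 and g(x) = (3/4) x^2, convex.
def g(x):
    ys = np.linspace(-10.0, 10.0, 2001)   # grid stand-in for the inf over y
    return np.min(x**2 + x*ys + ys**2)

xs = np.linspace(-3.0, 3.0, 7)
assert np.allclose([g(x) for x in xs], 0.75 * xs**2, atol=1e-3)
```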

  • Example: Let \(C \subseteq \mathbb{R}^n\) be a convex set, then

    \(g(x) = \inf _{y \in C} \lVert x - y \rVert\) is a convex function.

  • Proof: Use \(f(x, y) = \lVert x - y \rVert\) with \(dom \, f = \mathbb{R}^n \times C\), which is jointly convex:

    \[\begin{align*} f(\theta x_1 + (1 - \theta)x_2, \theta y_1 + (1 - \theta)y_2) = & \lVert \theta x_1 + (1 - \theta)x_2 - \theta y_1 - (1 - \theta)y_2 \rVert \\ = & \lVert \theta (x_1 - y_1) + (1 - \theta)(x_2 - y_2) \rVert \\ \le & \theta \lVert x_1 - y_1 \rVert + (1 - \theta) \lVert x_2 - y_2 \rVert \\ = & \theta f(x_1, y_1) + (1 - \theta) f(x_2, y_2) \end{align*}\]
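For a concrete check, take \(C\) to be the unit ball, where the distance function has a closed form (an assumption made only for this sketch):

```python
import numpy as np

# Distance to the unit ball C = {y : ||y|| <= 1} has the closed form
# g(x) = max(||x|| - 1, 0); we check its convexity numerically.
def dist_C(x):
    return max(np.linalg.norm(x) - 1.0, 0.0)

rng = np.random.default_rng(7)
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    t = rng.uniform()
    assert dist_C(t*x + (1-t)*y) <= t*dist_C(x) + (1-t)*dist_C(y) + 1e-12
```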

    Consider the function \(g(w) = \inf _x \sum _{i = 1}^m w_i (a_i^T x - b_i)^2\), the optimal value of a "weighted least squares" problem.

    For each fixed \(x\) the objective is linear in \(w\), so \(g\) is a pointwise infimum of linear functions of \(w\), hence a concave function of \(w\).

    In matrix form, \(g(w) = \inf _x (Ax - b)^T W (Ax - b) = \inf _x (x^T A^T W A x - 2b^T W A x + b^T W b)\) where \(W = diag(w)\);

    assume \(A^T W A \succ 0\), then optimal \(x = (A^T W A)^{-1} A^T W b\)

    (in general, for \(A \succ 0\): \(\min _x(x^T A x + 2b^T x)\) has \(\nabla = 2Ax + 2b = 0 \implies \text{best} \, x = -A^{-1} b\))

    \[\begin{align*} \text{optimal value} = & b^T W A (A^T W A)^{-1} A^T W b - 2 b^T W A (A^T W A)^{-1} A^T W b + b^T W b \\ = & - b^T W A (A^T W A)^{-1} A^T W b + b^T W b \\ = & \sum _{i = 1}^m b_i^2 w_i - \left(\sum _{i = 1}^m w_i b_i a_i^T \right) \left(\sum _{k = 1}^m w_k a_k a_k^T \right)^{-1} \left(\sum _{i = 1}^m w_i b_i a_i \right) \\ = & \sum _{i = 1}^m b_i^2 w_i - \sum _{i, j} w_i w_j b_i b_j a_i^T \left(\sum _{k = 1}^m w_k a_k a_k^T \right)^{-1} a_j \end{align*}\]

    (\(A^T W A = \sum a_i (w_i a_i^T) = \sum w_i a_i a_i^T\))
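Finally, a numerical check that \(g(w)\) is concave, using the closed-form optimal value above with a hypothetical random \(A\) and \(b\) (sketch only; assumes positive weights so that \(A^T W A \succ 0\)):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.normal(size=(8, 3))    # full column rank with probability 1
b = rng.normal(size=8)

def g(w):
    """Optimal value b^T W b - b^T W A (A^T W A)^{-1} A^T W b, W = diag(w)."""
    W = np.diag(w)
    M = A.T @ W @ A            # positive definite for w > 0
    c = A.T @ W @ b
    return b @ W @ b - c @ np.linalg.solve(M, c)

for _ in range(200):
    w1 = rng.uniform(0.1, 2.0, size=8)
    w2 = rng.uniform(0.1, 2.0, size=8)
    t = rng.uniform()
    # g is an infimum of linear functions of w, hence concave:
    assert g(t*w1 + (1-t)*w2) >= t*g(w1) + (1-t)*g(w2) - 1e-8
```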